DATA-DRIVEN GOODNESS-OF-FIT TESTS.

By Mikhail Langovoy

Hausdorff Research Institute for Mathematics,

Bonn, Germany.

We introduce a new general class of statistical tests. The class contains Neyman's smooth tests and data-driven efficient score tests as special examples. We prove general consistency theorems for the tests from the class. The paper shows that the tests can be applied to simple and composite parametric, semi- and nonparametric hypotheses. Our tests are additionally equipped with model selection rules. The rules make it possible to modify the tests by changing the penalty. Many of the optimal penalties derived in the statistical literature can be used in our tests. This gives hope that the proposed approach is convenient and powerful for different testing problems.



1. Introduction. Constructing good tests for statistical hypotheses is an essential problem of statistics. There are two main approaches to constructing test statistics. In the first approach, roughly speaking, some measure of distance between the theoretical and the corresponding empirical distributions is proposed as the test statistic. Classical examples of this approach are the Cramér-von Mises and the Kolmogorov-Smirnov statistics. More generally, $L_p$-distance based tests, as well as graphical tests based on confidence bands, usually belong to this type. Although these tests work and are capable of giving very good results, each of them is asymptotically optimal only in a finite number of directions of alternatives to a null hypothesis (see Nikitin (1995)).

Nowadays, there is increasing interest in the second approach to constructing test statistics. The idea of this approach is to construct tests in such a way that they are asymptotically optimal in some sense, or most powerful, at least in a rich enough set of directions. Test statistics constructed following this approach are often called score test statistics. The pioneer of this approach was Neyman (1937). See also, for example, Wilks (1938), Le Cam (1956), Neyman (1959), Cox and Hinkley (1974), Bickel and Ritov (1992), Ledwina (1994) for subsequent developments and improvements, and Fan et al. (2001), Bickel et al. (2006) and Li and Liang (2007) for recent results in the field. This approach is also closely related to the theory of efficient (adaptive) estimation: see Bickel et al. (1993), Ibragimov and Has'minskii (1981). Additionally, it was shown, at least in some basic situations, that data-driven score tests are asymptotically optimal in the sense of intermediate efficiency in an infinite number of directions of alternatives (see Inglot and Ledwina (1996)) and show good overall performance in practice (see Kallenberg and Ledwina (1995) and subsequent work by the same authors).

Another important line of development in the area of optimal testing concerns minimax testing (see Ingster (1993)) and adaptive minimax testing (see Spokoiny (1996)). Those tests are optimal in a certain minimax sense against wide classes of nonparametric alternatives. We do not discuss minimax testing theory in this paper, except for Remark 21 below. See Bickel et al. (2006) for a recent general overview of this and other existing theories of statistical testing, and a discussion of some advantages and disadvantages of different classes of testing methods.

This paper attempts to generalize the theory of data-driven score tests. The classical score tests have been substantially generalized in recent statistical literature: see, for example, the generalized likelihood ratio statistics for nonparametric models in Fan et al. (2001), tailor-made tests in Bickel et al. (2006) and the semiparametric generalized likelihood ratio statistics in Li and Liang (2007). The situation is similar to the one in estimation theory: there is a classical estimation method based on the use of maximum likelihood equations, and there is a more general method of M-estimation.

In this paper we propose a generalization of the theory of data-driven score tests. We introduce the notions of NT- and GNT-tests, generalizing the concepts of Neyman's smooth test statistics and data-driven score tests, for both simple and composite hypotheses. The main goal of this paper is to give a unified approach for proving consistency of NT- and GNT-tests. Usually, proofs of consistency for data-driven tests consist of two parts:

1) establishing large deviation inequalities for the test statistic;

2) deriving consistency of the test from these inequalities.

Our method provides a tool to pass through step 2 automatically. Additionally, the method allows a lot of freedom in the choice of penalties and dimension growth rates, and flexibility in model regularity assumptions.

The method is applicable to dependent data and statistical inverse problems. We conjecture (and provide some initial support for this claim in the paper) that both semi- and nonparametric generalized likelihood ratio statistics from Fan et al. (2001) and Li and Liang (2007), score processes from Bickel et al. (2006), and empirical likelihood from Owen (1988) could be used to build consistent data-driven NT- and GNT-tests.

Moreover, for any NT- or GNT-test, we have an explicit rule to determine, 
for every particular alternative, whether the test will be consistent against 
this alternative. This rule allows us to describe, in a closed form, the set of 
"bad" alternatives for every NT- and GNT-test. 

In Section 2 we describe the framework and introduce a class of SNT-statistics. In Section 3 we propose a general definition of a model selection rule. Section 4 is devoted to the definition of NT-statistics. This is the main concept of this paper. In Section 5 we study the behaviour of NT-statistics for the case when the alternative hypothesis is true, while in Section 6 we investigate what happens under the null hypothesis. At the end of Section 6 a consistency theorem for NT-statistics is given. Section 7 is devoted to some direct applications of our method. In Section 8 a new notion concerning the use of quadratic forms in statistics is introduced. This section is somewhat auxiliary for this paper. In Section 9 we introduce the notion of GNT-statistics. This notion generalizes the notion of score tests for composite hypotheses. We prove a general consistency theorem for GNT-statistics.

2. Notation and basic assumptions. Let $X_1, X_2, \ldots$ be a sequence of random variables with values in an arbitrary measurable space $\mathbb{X}$. Suppose that for every $m$ the random variables $X_1, \ldots, X_m$ have the joint distribution $P_m$ from the family of distributions $\mathcal{P}_m$. Suppose there is a given functional $\mathcal{F}$ acting from the direct product of the families $\otimes_{m=1}^{\infty} \mathcal{P}_m = (\mathcal{P}_1, \mathcal{P}_2, \ldots)$ to a known set $\Theta$, and that $\mathcal{F}(P_1, P_2, \ldots) = \theta$. We consider the problem of testing the hypothesis

$H_0:\ \theta \in \Theta_0 \subset \Theta$

against the alternative

$H_A:\ \theta \in \Theta_1 = \Theta \setminus \Theta_0$

on the basis of observations $Y_1, \ldots, Y_n$ having their values in an arbitrary measurable space $\mathbb{Y}$ (i.e. not necessarily on the basis of $X_1, \ldots, X_m$).

Here $\Theta$ can be any set, for example, a functional space; correspondingly, the parameter $\theta$ can be infinite dimensional. It is not assumed that $Y_1, \ldots, Y_n$ are independent or identically distributed. The measurable space $\mathbb{Y}$ can be, for example, infinite dimensional. This makes it possible to apply the results of this paper in statistics for stochastic processes. Additional assumptions on the $Y_i$'s will be imposed below, when necessary.

The exact form of the null hypothesis $H_0$ is not important for us at this moment: $H_0$ can be composite or simple, $H_0$ can be about the densities or expectations of the $Y_i$'s, or it can be of any other form. An important feature of our approach is that we are able to consider the case when $H_0$ is not about the observable $Y_i$'s, but about some other random variables $X_1, \ldots, X_m$. This makes it possible to use our method in the case of statistical inverse problems. Under some conditions (see Theorem 9) it is still possible to extract from the $Y_i$'s some information about the $X_i$'s and build a consistent test.
Definition 1. Consider the following statistic of the form

(1)    $T_k = \sum_{j=1}^{k} \Big\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} l_j(Y_i) \Big\}^2 ,$

where $n$ is the number of available observations $Y_1, \ldots, Y_n$ and $l_1, \ldots, l_k$, $l_i : \mathbb{Y} \to \mathbb{R}$, are some known Lebesgue measurable functions. We call $T_k$ the simplified statistic of Neyman's type (or SNT-statistic).

Here $l_1, \ldots, l_k$ can be some score functions, as was the case for the classical Neyman's test, but it is possible to use any other functions, depending on the problem under consideration. We prove below that under additional assumptions it is possible to construct consistent tests of such form without using scores in (1). We will discuss different possible sets of meaningful additional assumptions on $l_1, \ldots, l_k$ below (see Sections 5 - 9).

Scores (and efficient scores) are based on the notion of maximum likelihood. In our constructions it is possible to use, for example, truncated, penalized or partial likelihood to build a test. In this sense, our theory generalizes the theory of score tests, just as M-estimation generalizes classical likelihood estimation. It is even possible to use functions $l_1, \ldots, l_k$ that are unrelated to any kind of likelihood.



Example 1. The basic example of an SNT-statistic is Neyman's smooth test statistic for simple hypotheses (see Neyman (1937) or Ledwina (1994)). Let $X_1, \ldots, X_n$ be i.i.d. random variables. Consider the problem of testing the simple null hypothesis $H_0$ that the $X_i$'s have the uniform distribution on $[0, 1]$. Let $\{\phi_j\}$ denote the family of orthonormal Legendre polynomials on $[0, 1]$. Then for every $k$ one has the test statistic

$T_k = \sum_{j=1}^{k} \Big\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \phi_j(X_i) \Big\}^2 .$

We see that Neyman's classical test statistic is an SNT-statistic.

imsart-aos ver. 2007/04/13 file: NT_A0S_2.tex date: February 1, 2008
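As an illustration only (not part of the original text), Neyman's smooth statistic from Example 1 can be evaluated numerically. The sketch below assumes NumPy; the helper names `phi` and `neyman_statistic` are hypothetical, and the orthonormal Legendre polynomials on $[0, 1]$ are taken to be $\phi_j(x) = \sqrt{2j+1}\, P_j(2x - 1)$, where $P_j$ is the standard Legendre polynomial on $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import legendre


def phi(j, x):
    """Orthonormal Legendre polynomial of degree j on [0, 1] (assumed normalization)."""
    coef = np.zeros(j + 1)
    coef[j] = 1.0  # select the standard Legendre polynomial P_j
    return np.sqrt(2 * j + 1) * legendre.legval(2.0 * np.asarray(x, dtype=float) - 1.0, coef)


def neyman_statistic(x, k):
    """T_k = sum_{j=1}^k ( n^{-1/2} * sum_i phi_j(X_i) )^2 for a sample x in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = np.array([phi(j, x).sum() / np.sqrt(n) for j in range(1, k + 1)])
    return float(np.sum(z ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.uniform(size=200)  # data generated under the null hypothesis
    print(neyman_statistic(sample, k=4))
```

Large values of the statistic speak against uniformity; the choice of $k$ is exactly what the selection rule of Section 3 automates.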



Example 2. Partial likelihood. Cox (1975) proposed the notion of partial likelihood, generalizing the ideas of conditional and marginal likelihood. Applications of partial likelihood are numerous, including inference in stochastic processes. Below we give Cox's definition of partial likelihood and then construct SNT-statistics based on this notion.

Consider a random variable $Y$ having the density $f_Y(y; \theta)$. Let $Y$ be transformed into the sequence

(2)    $(X_1, S_1, X_2, S_2, \ldots, X_m, S_m),$

where the components may themselves be vectors. The full likelihood of the sequence (2) is

(3)    $\prod_{j=1}^{m} f_{X_j | X^{(j-1)}, S^{(j-1)}}\big(x_j \,|\, x^{(j-1)}, s^{(j-1)}; \theta\big) \ \prod_{j=1}^{m} f_{S_j | X^{(j)}, S^{(j-1)}}\big(s_j \,|\, x^{(j)}, s^{(j-1)}; \theta\big),$

where $x^{(j)} = (x_1, \ldots, x_j)$ and $s^{(j)} = (s_1, \ldots, s_j)$. The second product is called the partial likelihood based on $S$ in the sequence $\{X_j, S_j\}$. The partial likelihood is useful especially when it is substantially simpler than the full likelihood, for example when it involves only the parameters of interest and not nuisance parameters. In Cox (1975) some specific examples are given.



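Assume now, for simplicity of notation, that $\theta$ is just a real parameter and that we want to test the simple hypothesis $H_0: \theta = \theta_0$ against some class of alternatives. Define for $j = 1, \ldots, m$ the functions

(4)    $t_j = \frac{\partial}{\partial \theta} \log f_{S_j | X^{(j)}, S^{(j-1)}}\big(s_j \,|\, x^{(j)}, s^{(j-1)}; \theta\big) \Big|_{\theta = \theta_0}$

and $\sigma_j^2 := \mathrm{var}(t_j)$. If we define $l_j := t_j / \sigma_j$, we can form the SNT-test statistic

(5)    $PL_m = \sum_{j=1}^{m} \Big\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} l_j(Y_i) \Big\}^2 .$

□

Consistency theorems for SNT-statistics will follow from consistency theorems for the more general NT-statistics that are introduced in Section 4. See, for example, Theorem 10.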

Remark 1. There is a direct method that makes it possible to find asymptotic distributions of SNT-statistics, both under the null hypothesis and under alternatives. The idea of the method is as follows: one approximates the quadratic form $T_k$ (that has the form $Z_1^2 + \ldots + Z_k^2$) by the quadratic form $N_1^2 + \ldots + N_k^2$, where $N_i$ is the Gaussian random variable with the same mean and covariance structure as $Z_i$, i.e. the $i$-th component of $T_k$. This approximation is possible, for example, if the $l(Y_i)$'s are i.i.d. random vectors with nondegenerate covariance operators and finite third absolute moments. Then the error of approximation is of order $n^{-1/2}$ and depends on the smallest eigenvalue of the covariance of $l(Y_i)$. See Götze and Tikhomirov (1999), p. 1078 for more details. The asymptotic distribution and large deviations of the quadratic form $N_1^2 + \ldots + N_k^2$ have been studied extensively.
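The approximation described in Remark 1 can be explored by simulation. The following is a minimal sketch under stated assumptions, not a result from the paper: it takes the setting of Example 1 (uniform data, orthonormal Legendre polynomials on $[0, 1]$), in which the components $Z_j$ have mean zero, unit variance and are uncorrelated under the null hypothesis, so that the Gaussian counterpart is a sum of $k$ squared independent standard normal variables. The function names, sample sizes and seed are arbitrary choices made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def phi(j, x):
    """Orthonormal Legendre polynomial of degree j on [0, 1] (assumed normalization)."""
    coef = np.zeros(j + 1)
    coef[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(2.0 * x - 1.0, coef)


def simulate_quadratic_forms(n, k, reps, seed=0):
    """Return simulated values of Z_1^2+...+Z_k^2 and of N_1^2+...+N_k^2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(reps, n))  # reps independent samples of size n under H_0
    # Z_j = n^{-1/2} * sum_i phi_j(X_i), computed for every replication at once
    z = np.stack([phi(j, x).sum(axis=1) / np.sqrt(n) for j in range(1, k + 1)], axis=1)
    t = (z ** 2).sum(axis=1)
    # Gaussian quadratic form with the same mean and covariance structure
    g = (rng.standard_normal(size=(reps, k)) ** 2).sum(axis=1)
    return t, g


if __name__ == "__main__":
    t, g = simulate_quadratic_forms(n=100, k=3, reps=4000)
    print("means:", t.mean(), g.mean())
    print("0.95 quantiles:", np.quantile(t, 0.95), np.quantile(g, 0.95))
```

Comparing empirical means and upper quantiles of the two samples gives a rough feeling for the quality of the Gaussian approximation at a given $n$; it does not replace the error bounds cited in the remark.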

3. Selection rule. Since it was shown that for applications of efficient score tests it is important to select the right number of components in the test statistic (see Bickel and Ritov (1992), Eubank et al., Fan (1996), Kallenberg and Ledwina, Kallenberg (2002)), it is desirable to provide a corresponding refinement for our construction. Using the idea of a penalized likelihood, we propose a general mathematical framework for constructing a rule to find reasonable model dimensions. We make our tests data-driven, i.e., the tests are capable of choosing a reasonable number of components in the test statistics automatically from the data. Our construction offers a lot of freedom in the choice of penalties and building blocks for the statistics. A statistician can take into account specific features of the particular problem at hand and choose, among all the theoretical possibilities, the most suitable penalty and the most suitable structure of the test statistic to build a test with desired properties.

We will not restrict the possible number of components in test statistics by some fixed number; instead, we allow the number of components to grow without bound as the number of observations grows. This is important because the more observations $Y_1, \ldots, Y_n$ we have, the more information is available about the problem. This makes it possible to give a more detailed description of the phenomena under investigation. In our case this means that the complexity of the model and the possible number of components in the corresponding test statistic grow with $n$ at a controlled rate.
Denote by $M_k$ a statistical model designed for a specific statistical problem satisfying the assumptions of Section 2. Assume that the true parameter value $\theta$ belongs to the parameter set of $M_k$, call it $\Theta_k$. We say that the family of models $M_k$ for $k = 1, 2, \ldots$ is nested if for their parameter sets it holds that $\Theta_1 \subseteq \Theta_2 \subseteq \ldots$. We do not require the $\Theta_k$'s to be finite dimensional. We also do not require that all $\Theta_k$'s are different (this has a meaning in statistics: see the first remark on page 221 of Birgé and Massart (2001)).

Let $T_k$ be an arbitrary statistic for testing validity of the model $M_k$ on the basis of observations $Y_1, \ldots, Y_n$. The following definition applies to the sequence of statistics $\{T_k\}$.

Definition 2. Consider a nested family of models $M_k$ for $k = 1, \ldots, d(n)$, where $d(n)$ is a control sequence, giving the largest possible model dimension for the case of $n$ observations. Choose a function $\pi(\cdot, \cdot): \mathbb{N} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of natural numbers. Assume that $\pi(1, n) < \pi(2, n) < \ldots < \pi(d(n), n)$ for all $n$ and $\pi(j, n) - \pi(1, n) \to \infty$ as $n \to \infty$ for every $j = 2, \ldots, d(n)$. Call $\pi(j, n)$ a penalty attributed to the $j$-th model $M_j$ and the sample size $n$. Then a selection rule $S$ for the sequence of statistics $\{T_k\}$ is an integer-valued random variable satisfying the condition

(6)    $S = \min\{k :\ 1 \le k \le d(n);\ T_k - \pi(k, n) \ge T_j - \pi(j, n),\ j = 1, \ldots, d(n)\} .$

We call $T_S$ a data-driven test statistic for testing validity of the initial model. The definition is meaningful, of course, only if the sequence $\{T_k\}$ is increasing in the sense that $T_1(Y_1, \ldots, Y_n) \le T_2(Y_1, \ldots, Y_n) \le \ldots$.
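The selection rule (6) is straightforward to express in code. The sketch below is an illustration under assumptions: the statistics $T_1, \ldots, T_{d(n)}$ are supplied as a list, the function names `select_dimension` and `schwarz_penalty` are hypothetical, and Schwarz's penalty $\pi(j, n) = j \log n$ is used only as one admissible example of a penalty.

```python
import math


def schwarz_penalty(j, n):
    """Schwarz-type penalty pi(j, n) = j * log(n)."""
    return j * math.log(n)


def select_dimension(stats, penalty, n):
    """Selection rule (6): the smallest k maximizing T_k - pi(k, n).

    stats[k - 1] holds T_k for k = 1, ..., d(n).
    """
    penalized = [t - penalty(k, n) for k, t in enumerate(stats, start=1)]
    # list.index returns the first position of the maximum, i.e. the smallest k
    return 1 + penalized.index(max(penalized))


if __name__ == "__main__":
    stats = [1.0, 9.0, 9.5]  # an increasing sequence T_1 <= T_2 <= T_3
    s = select_dimension(stats, schwarz_penalty, n=100)
    print("selected dimension:", s, "data-driven statistic:", stats[s - 1])
```

Taking the first maximizer mirrors the `min` in (6): among all dimensions with the same penalized value, the smallest model is preferred.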

In the statistical literature, one usually tries to choose penalties that possess some sort of minimax or Bayesian optimality. Classical examples of penalties constructed via this approach are Schwarz's penalty $\pi(j, n) = j \log n$ (see Schwarz (1978)) and minimum description length penalties (see Rissanen (1983)). For more examples of optimal penalties and recent developments, see Abramovich et al. (2007), Birgé and Massart (2001) or Bunea et al. (2007). In this paper, we do not aim for optimality of the penalization; our goal is to be able to build consistent data-driven tests based on different choices of penalties. The penalization technique that we use in this paper allows for many possible choices of penalties. It seems that in our framework it is possible to use most of the penalties from the abovementioned papers. As an illustration, see Example 3 below.



Example 2 (continued). We have an interesting possibility concerning the statistic $PL_m$. This statistic depends on the number $m$ of components in the sequence (2). Suppose now that $Y$ can be transformed into sequences $(X_1, S_1)$, or $(X_1, S_1, X_2, S_2)$, or even $(X_1, S_1, X_2, S_2, \ldots, X_m, S_m)$ for any natural $m$. If we are free to choose the partition number $m$, then which $m$ is the best choice? If $m$ is too small, one can lose a lot of information about the problem; and if $m$ is too big, then the resulting partial likelihood can be as complicated as the full one. Definition 2 proposes a solution to this problem. The adaptive statistic $PL_S$ will choose a reasonable number of components in the transformed sequence automatically from the data. □



Example 3 (Gaussian model selection). Birgé and Massart (2001) proposed a method of model selection in a framework of Gaussian linear processes. This framework is quite general and includes as special cases Gaussian regression with fixed design, Gaussian sequences and the model of Ibragimov and Has'minskii. In this example we briefly describe the construction (for more details see the original paper) and then discuss the relations with our results.

Given a linear subspace $\mathbb{S}$ of some Hilbert space $\mathbb{H}$, we call Gaussian linear process on $\mathbb{S}$, with mean $s \in \mathbb{H}$ and variance $\varepsilon^2$, any process $Y$ indexed by $\mathbb{S}$ of the form

$Y(t) = \langle s, t \rangle + \varepsilon Z(t),$

for all $t \in \mathbb{S}$, where $Z$ denotes a linear isonormal process indexed by $\mathbb{S}$ (i.e. $Z$ is a centered and linear Gaussian process with covariance structure $E[Z(t) Z(u)] = \langle t, u \rangle$). Birgé and Massart considered estimation of $s$ in this model.

Let $S$ be a finite dimensional subspace of $\mathbb{S}$ and set $\gamma(t) = \|t\|^2 - 2Y(t)$. One defines the projection estimator on $S$ to be the minimizer of $\gamma(t)$ with respect to $t \in S$. Given a finite or countable family $\{S_m\}_{m \in \mathcal{M}}$ of finite dimensional linear subspaces of $\mathbb{S}$, the corresponding family of projection estimators $\hat{s}_m$, built for the same realization of the process $Y$, and given a nonnegative function pen defined on $\mathcal{M}$, Birgé and Massart estimated $s$ by a penalized projection estimator $\tilde{s} = \hat{s}_{\hat{m}}$, where $\hat{m}$ is any minimizer with respect to $m \in \mathcal{M}$ of the penalized criterion

$\mathrm{crit}(m) = -\|\hat{s}_m\|^2 + \mathrm{pen}(m) = \gamma(\hat{s}_m) + \mathrm{pen}(m).$

They proposed some specific penalties pen such that the penalized projection estimator has the optimal order of risk with respect to a wide class of loss functions. The method of model selection of this paper is related to the one of Birgé and Massart (2001).

In the model of Birgé and Massart, $\gamma(t)$ is the least squares criterion and $\hat{s}_m$ is the least squares estimator of $s$, which is in this case the maximum likelihood estimator. Therefore $\|\hat{s}_m\|^2$ is the Neyman score for testing the hypothesis $s = 0$ within this model. Risk-optimizing penalties pen proposed in Birgé and Massart (2001) satisfy the conditions of Definition 2 (after the change of notation $\mathrm{pen}(m) = \pi(m, n)$; for the explicit expressions of pen's see the original paper). Therefore, $\|\hat{s}_{\hat{m}}\|^2$ is, in our terminology, a data-driven SNT-statistic. As follows from the consistency Theorem 9 below, $\|\hat{s}_{\hat{m}}\|^2$ can be used for testing $s = 0$ and has a good range of consistency.

4. NT-statistics. Now we introduce the main concept of this paper. Suppose that we are under the general setup of Section 2.

Definition 3. Suppose we have $n$ random observations $Y_1, \ldots, Y_n$ with values in a measurable space $\mathbb{Y}$. Let $k$ be a fixed number and $l = (l_1, \ldots, l_k)$ be a vector-function, where $l_i : \mathbb{Y} \to \mathbb{R}$ for $i = 1, \ldots, k$ are some known Lebesgue measurable functions. We assume that the $Y_i$'s and $l_i$'s are as general as in Definition 1. Set

(7)    $L = \big\{ E_0 [l(Y)]^T l(Y) \big\}^{-1} ,$

where the mathematical expectation $E_0$ is taken with respect to $P_0$, and $P_0$ is the (fixed and known in advance) distribution function of some auxiliary random variable $Y$, where $Y$ takes its values in the space $\mathbb{Y}$. Assume that $E_0\, l(Y) = 0$ and that $L$ is well defined in the sense that all its elements are finite. Put

(8)    $T_k = \Big\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l(Y_j) \Big\} \, L \, \Big\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l(Y_j) \Big\}^T .$

We call $T_k$ a statistic of Neyman's type (or NT-statistic).

If, for example, the $Y_i$'s are identically distributed, then the natural choice for $P_0$ is their distribution function under the null hypothesis. Thus, $L$ will be the inverse of the covariance matrix of the vector $l(Y)$. Such a construction is often used in score tests for simple hypotheses. But our definitions allow the use of a reasonable substitute instead of the covariance matrix. This possibility can help for testing in a semi- or nonparametric case: instead of finding a complicated covariance in a nonparametric situation, one could use $P_0$ from a much simpler parametric family, thus getting a reasonably working test and avoiding a considerable amount of technicalities. Of course, this $P_0$ will have to satisfy consistency conditions, but after that we get a consistent test regardless of the unusual choice of $P_0$. Consistency conditions put a serious restriction on the possible $P_0$: they are a mathematical formalization of the idea of how $P_0$ should be connected to the $Y_i$'s.
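To make Definition 3 concrete, the quadratic form defining an NT-statistic can be computed as in the following sketch. It is only an illustration under assumptions: the values $l(Y_1), \ldots, l(Y_n)$ are assumed to be already evaluated and stored as the rows of an array, the matrix $E_0 [l(Y)]^T l(Y)$ is assumed to be known and invertible, and the function name `nt_statistic` is hypothetical.

```python
import numpy as np


def nt_statistic(scores, cov0):
    """NT-statistic T_k = z L z^T with z = n^{-1/2} * sum_i l(Y_i) and L = cov0^{-1}.

    scores: array of shape (n, k), row i holds l(Y_i)
    cov0:   array of shape (k, k), the matrix E_0[l(Y)^T l(Y)] under P_0
    """
    scores = np.asarray(scores, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    n = scores.shape[0]
    z = scores.sum(axis=0) / np.sqrt(n)
    # Solving the linear system avoids forming the inverse matrix L explicitly
    return float(z @ np.linalg.solve(cov0, z))


if __name__ == "__main__":
    scores = np.array([[1.0, 0.0], [1.0, 2.0]])
    print(nt_statistic(scores, np.eye(2)))            # L is the identity matrix
    print(nt_statistic(scores, np.diag([2.0, 4.0])))  # a different choice of P_0
```

When `cov0` is the identity matrix the value coincides with the sum of squares from Definition 1, which is the link between NT- and SNT-statistics discussed in Example 4.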



Example 2 (continued). It is possible to define by the formula (8) a version of the partial likelihood statistic $PL_m$ for the case when $\theta$ is multidimensional or even infinite dimensional. In Cox (1975) it is shown that under additional regularity assumptions $E(t_j) = 0$. In this case $PL_m$ will be an NT-statistic (but not an SNT-statistic).

Example 4 (trivial). If for the SNT-statistic $T_k$ defined by (1) additionally $E_0\, l(Y) = 0$, then $T_k$ is obviously an NT-statistic. Therefore, in most situations of interest the notion of NT-statistics is more general than that of SNT-statistics. The first reason for introducing SNT-statistics as a special class is that for this special case there is a well-developed theory for finding asymptotic distributions of the corresponding quadratic forms, and therefore there could be some asymptotic results and rates for SNT-statistics that are stronger than the corresponding results for NT-statistics (see Remark 1). The second reason is that there exist SNT-statistics of interest that are not NT-statistics. These, however, will not be studied in this paper.



Example 5. Statistical inverse problems. The most well-known example here is the deconvolution problem. This problem appears when one has noisy signals or measurements: in physics, seismology, optics and imaging, engineering. It is a building block for many complicated statistical inverse problems. It is possible to construct data-driven score tests for the problem (see Langovoy (2007b)).

The problem is formulated as follows. Suppose that instead of $X_i$ one observes $Y_i$, where

$Y_i = X_i + \varepsilon_i ,$

and the $\varepsilon_i$'s are i.i.d. with a known density $h$ with respect to the Lebesgue measure $\lambda$; also $X_i$ and $\varepsilon_i$ are independent for each $i$, and $E \varepsilon_i = 0$, $E \varepsilon_i^2 < \infty$. Assume that $X$ has a density with respect to $\lambda$. Our null hypothesis $H_0$ is the simple hypothesis that $X$ has a known density $f_0$ with respect to $\lambda$. Let us choose for every $k \le d(n)$ an auxiliary parametric family $\{f_\theta\}$, $\theta \in \Theta \subseteq \mathbb{R}^k$, such that $f_0$ from this family coincides with $f_0$ from the null hypothesis $H_0$. The true $F$ possibly has no relation to the chosen $\{f_\theta\}$. Set

(9)    $l(y) = \frac{ \frac{\partial}{\partial \theta} \Big( \int_{\mathbb{R}} f_\theta(s)\, h(y - s)\, ds \Big) \Big|_{\theta = 0} }{ \int_{\mathbb{R}} f_0(s)\, h(y - s)\, ds }$

and define the corresponding test statistic $U_k$ by the formula (8). Under appropriate regularity assumptions, $U_k$ is an NT-statistic (see Langovoy (2007a)).



Example 6. Rank Tests for Independence. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. random variables with the distribution function $D$ and the marginal distribution functions $F$ and $G$ for $X_1$ and $Y_1$. Assume that $F$ and $G$ are continuous, but unknown. The aim is to test the null hypothesis of independence

(10)    $H_0:\ D(x, y) = F(x) G(y), \quad x, y \in \mathbb{R},$

against a wide class of alternatives. The following construction was proposed in Kallenberg and Ledwina (1999).

Let $b_j$ denote the $j$-th orthonormal Legendre polynomial (i.e., $b_1(x) = \sqrt{3}(2x - 1)$, $b_2(x) = \sqrt{5}(6x^2 - 6x + 1)$, etc.). The score test statistic from Kallenberg and Ledwina (1999) is

(11)    $T_k = \sum_{j=1}^{k} \Big\{ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} b_j\Big(\frac{R_i - 1/2}{n}\Big)\, b_j\Big(\frac{S_i - 1/2}{n}\Big) \Big\}^2 ,$

where $R_i$ stands for the rank of $X_i$ among $X_1, \ldots, X_n$, and $S_i$ for the rank of $Y_i$ among $Y_1, \ldots, Y_n$. Thus defined, $T_k$ satisfies Definition 3 of NT-statistics: put

$Z_i = \big(Z_i^{(1)}, Z_i^{(2)}\big) := \Big( \frac{R_i - 1/2}{n},\ \frac{S_i - 1/2}{n} \Big)$

and $l_j(Z_i) := b_j\big(Z_i^{(1)}\big)\, b_j\big(Z_i^{(2)}\big)$. Under the null hypothesis $L_k = E_{k \times k}$ (the identity matrix), and $E_0\, l(Z) = 0$. Thus, $T_k$ is an NT-statistic. The new $Z_i$ depends on the original $(X_i, Y_i)$'s in a nontrivial way, but still contains some information about the pair of interest.

The selection rule proposed in Kallenberg and Ledwina (1999) to choose the number of components $k$ in $T_k$ was

(12)    $S = \min\{k :\ 1 \le k \le d(n);\ T_k - k \log n \ge T_j - j \log n,\ j = 1, 2, \ldots, d(n)\} .$

This selection rule satisfies Definition 2, and so the data-driven statistic $T_S$ from Kallenberg and Ledwina (1999) is a data-driven NT-statistic. □
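The construction of Example 6 can be sketched numerically as follows. This is a hedged illustration rather than the reference implementation of Kallenberg and Ledwina (1999): it assumes NumPy, assumes there are no ties in the data, takes $b_j(u) = \sqrt{2j+1}\, P_j(2u - 1)$ with $P_j$ the standard Legendre polynomial, and uses hypothetical function names.

```python
import numpy as np
from numpy.polynomial import legendre


def b(j, u):
    """j-th orthonormal Legendre polynomial on [0, 1] (assumed normalization)."""
    coef = np.zeros(j + 1)
    coef[j] = 1.0
    return np.sqrt(2 * j + 1) * legendre.legval(2.0 * u - 1.0, coef)


def ranks(v):
    """Ranks 1..n of the entries of v, assuming there are no ties."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=int)
    r[order] = np.arange(1, len(v) + 1)
    return r


def rank_statistics(x, y, d):
    """Return the array (T_1, ..., T_d) of rank score statistics."""
    n = len(x)
    u = (ranks(x) - 0.5) / n
    v = (ranks(y) - 0.5) / n
    z = np.array([np.sum(b(j, u) * b(j, v)) / np.sqrt(n) for j in range(1, d + 1)])
    return np.cumsum(z ** 2)


def data_driven_rank_statistic(x, y, d):
    """Apply the Schwarz-type selection rule and return (S, T_S)."""
    t = rank_statistics(x, y, d)
    penalized = t - np.arange(1, d + 1) * np.log(len(x))
    s = int(np.argmax(penalized)) + 1  # argmax returns the first maximizer
    return s, float(t[s - 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = x + rng.normal(size=100)  # a dependent pair, for illustration
    print(data_driven_rank_statistic(x, y, d=5))
```

Because the statistic depends on the data only through ranks, it is unchanged by strictly increasing transformations of either coordinate, which is why the unknown marginals $F$ and $G$ do not enter the computation.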



5. Alternatives. Now we shall investigate consistency of tests based on data-driven NT-statistics. In this section we study the behavior of NT-statistics under alternatives.

We impose additional assumptions on the abstract model of Section 2. First, we assume that $Y_1, Y_2, \ldots$ are identically distributed. We do not assume that $Y_1, Y_2, \ldots$ are independent. It is possible that the sequence of interest $X_1, X_2, \ldots$ consists of dependent and nonidentically distributed random variables. It is only important that the new (possibly obtained by a complicated transformation) sequence $Y_1, Y_2, \ldots$ obeys the consistency conditions. Then it is possible to build consistent tests of hypotheses about the $X_i$'s. The reason for this is that, even after a complicated transformation, the transformed sequence can still contain some part of the information about the sequence of interest. However, if the transformed sequence $Y_1, Y_2, \ldots$ is not chosen reasonably, then the test can be meaningless: it can be formally consistent, but against an empty or almost empty set of alternatives.

Let $P$ denote an alternative distribution of the $Y_i$'s. Suppose that $E_P\, l(Y)$ exists. Another assumption we impose is that the $l(Y_i)$'s satisfy both the law of large numbers and the multivariate central limit theorem, i.e. that for the vectors $l(Y_1), \ldots, l(Y_n)$ it holds that

$\frac{1}{n} \sum_{j=1}^{n} l(Y_j) \to E_P\, l(Y)$ in $P$-probability as $n \to \infty$,

(13)    $n^{-1/2} \sum_{j=1}^{n} \big( l(Y_j) - E_P\, l(Y) \big) \to_d \mathcal{N}(0, L^{-1}) ,$

where $L$ is defined by (7) and $\mathcal{N}(0, L^{-1})$ denotes the $k$-dimensional normal distribution with mean $0$ and covariance matrix $L^{-1}$.

These assumptions put a serious restriction on the choice of the function $l$ and leave us with a uniquely determined $P_0$. In this paper we are not using the full generality of Definition 3. Nonetheless, the random variables of interest $X_1, \ldots, X_n$ are still allowed to be arbitrarily dependent and nonidentically distributed, and their transformed counterparts $Y_1, \ldots, Y_n$ are still allowed to be dependent.
Now we formulate the following consistency condition:

(C)    there exists an integer $K = K(P) \ge 1$ such that

$E_P\, l_1(Y) = 0, \ \ldots, \ E_P\, l_{K-1}(Y) = 0, \quad E_P\, l_K(Y) = C_P \ne 0 ,$

where $l_1, \ldots, l_k$ are as in Definition 3.

We assume additionally (without loss of generality) that

(14)    $\lim_{n \to \infty} d(n) = \infty .$



Remark 2. Assumption (14) describes the most interesting case. It is not very important from a statistical point of view to include the possibility that $d(n)$ is non-monotone. The case when $d(n)$ is nondecreasing and bounded from above by some constant $D$ can be handled analogously to the method of this paper, only the proofs will be shorter.

Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k$ be the ordered eigenvalues of $L$, where $L$ is as in Definition 3. To avoid possible confusion in the statement of the next theorem, we have to modify our notation slightly. Recall that in Definition 3 $L$ is a $k \times k$ matrix. Below we will sometimes need to denote it by $L_k$ in order to stress the model dimension. Accordingly, the ordered eigenvalues of $L_k$ will be denoted by $\lambda_1^{(k)} \ge \lambda_2^{(k)} \ge \ldots \ge \lambda_k^{(k)}$. We have the sequence of matrices $\{L_k\}_{k=1}^{\infty}$ and each matrix has its own eigenvalues. Whenever possible, we will use the simplified notation from Definition 3.
Theorem 3. Let (C) and (14) hold and

(15)    $\lim_{n \to \infty} \ \sup_{k \le d(n)} \frac{\pi(k, n)}{n \lambda_k^{(k)}} = 0 .$

Then

$\lim_{n \to \infty} P(S \ge K) = 1 .$

Remark 4. Condition (15) means that not only does $n$ tend to infinity, but it is also possible for $k$ to grow infinitely, though at a controlled rate.

Now suppose that the alternative distribution $P$ is such that (C) is satisfied and that there exists a sequence $\{r_n\}_{n=1}^{\infty}$ such that $\lim_{n \to \infty} r_n = \infty$ and

(A)    $P\Big( \Big| \frac{1}{n} \sum_{i=1}^{n} l_K(Y_i) - C_P \Big| \ge y \Big) = o\Big( \frac{1}{r_n} \Big) .$

Note that in (A) we do not require uniformity in $y$, i.e. $r_n$ gives us the rate, but the exact bound can depend on $y$. In some sense condition (A) is a way to make the weak law of large numbers for the $l_K(Y_i)$'s more precise. As an illustration, we prove the next lemma.

Lemma 5. Let the $l_K(Y_i)$'s be bounded i.i.d. random variables with finite expectation and variance $\sigma^2$. Then condition (A) is satisfied with $r_n = \exp(n y^2 / 2\sigma^2)$.

Therefore, one can often expect exponential rates in (A), but even a much slower rate is not a problem. The main theorem of this section is

(16)    $d(n) = o(r_n)$ as $n \to \infty$.

Then $T_S \to_P \infty$ as $n \to \infty$.

6. The null hypothesis. Now we study the asymptotic behavior of 
data-driven NT-statistics under the nuh hypothesis. We need one more def- 
inition first. 

Definition 4. Let $\{T_k\}$ be a sequence of NT-statistics and $S$ be a selection rule for it. Suppose that $\lambda_1 \ge \lambda_2 \ge \dots$ are ordered eigenvalues of $L$, where $L$ is defined by (7). We say that the penalty $\pi(k,n)$ in $S$ is of proper weight, if the following conditions hold:

1. there exist sequences of real numbers $\{s(k,n)\}_{k,n=1}^{\infty}$, $\{t(k,n)\}_{k,n=1}^{\infty}$ such that

(a)  $\lim_{n\to\infty}\ \sup_{k \le u_n} \frac{s(k,n)}{n\,\lambda_k} = 0,$

where $\{u_n\}_{n=1}^{\infty}$ is some real sequence such that $\lim_{n\to\infty} u_n = \infty$.
(b)  $\lim_{n\to\infty} t(k,n) = \infty$ for every $k \ge 2$,
$\lim_{k\to\infty} t(k,n) = \infty$ for every fixed $n$.

2. $s(k,n) \le \pi(k,n) - \pi(1,n) \le t(k,n)$ for all $k$, $n$
3.

$\lim_{n\to\infty}\ \sup_{k \le m_n} \frac{\pi(k,n)}{n\,\lambda_k} = 0,$

where $\{m_n\}_{n=1}^{\infty}$ is some real sequence such that $\lim_{n\to\infty} m_n = \infty$.

For notational convenience, we define for $l = (l_1, \dots, l_k)$ from Definition 3

(17)  $\bar l_j := \frac{1}{n}\sum_{i=1}^{n} l_j(Y_i),$

(18)  $\bar l := (\bar l_1, \bar l_2, \dots, \bar l_k)$

and, using the notation $L$ from Definition 3, a quadratic form

(19)  $Q_k(\bar l) = (\bar l_1, \bar l_2, \dots, \bar l_k)\, L\, (\bar l_1, \bar l_2, \dots, \bar l_k)^T.$

The first reason for the new notation is that $T_k = n\,Q_k(\bar l)$, where $T_k$ is the statistic from Definition 3. It is more convenient to formulate and prove Theorem 7 below using the quadratic form $Q_k$ rather than $T_k$ itself. And the main value of introducing $Q_k$ will be seen in Section 8, where $Q_k$ is the central object.
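
The relation between the averaged scores and the statistic can be sketched in a few lines of Python. This is only an illustration under the assumption that $T_k = n\,Q_k(\bar l)$, which is how the quadratic form enters condition (B) below; the function names and the list-of-rows data layout are hypothetical choices made here.

```python
def mean_scores(scores):
    """Column means: scores[i][j] holds l_j(Y_i); returns (l-bar_1, ..., l-bar_k)."""
    n = len(scores)
    k = len(scores[0])
    return [sum(row[j] for row in scores) / n for j in range(k)]


def quadratic_form(vec, matrix):
    """Value of vec * matrix * vec^T for a square matrix given as nested lists."""
    k = len(vec)
    return sum(vec[a] * matrix[a][b] * vec[b] for a in range(k) for b in range(k))


def nt_statistic(scores, matrix):
    """n * Q_k(l-bar), i.e. the statistic written through the quadratic form (19)."""
    n = len(scores)
    return n * quadratic_form(mean_scores(scores), matrix)


# Two observations, two score functions, identity matrix in place of L.
example_scores = [[1.0, 0.0], [1.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(nt_statistic(example_scores, identity))
```

With a non-diagonal matrix in place of `identity` the same function covers the case where $L$ is not diagonal, which is the situation the later sections are concerned with.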

Below we use the notation of Definitions 3 and 4.

Definition 5. Let $S$ be a selection rule with a penalty of proper weight. Assume that there exists a Lebesgue measurable function $\varphi(\cdot,\cdot) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$, such that $\varphi$ is monotonically decreasing in the second argument and monotonically nondecreasing in the first one, and assume that

1. (B2) for every $\varepsilon > 0$ there exists $K = K_\varepsilon$ such that for every $n > n(\varepsilon)$

$\sum_{k=K_\varepsilon}^{u_n} \varphi(k;\, s(k,n)) < \varepsilon,$

where $\{u_n\}_{n=1}^{\infty}$ is as in Definition 4.

imsart-aos ver. 2007/04/13 file: NT_A0S_2.tex date: February 1, 2008 




2. (B)

$P_0\bigl(n\,Q_k(\bar l) \ge y\bigr) \le \varphi(k;y)$

for all $k \ge 1$ and $y \in [s(k,n);\, t(k,n)]$, where $P_0$ is as in Definition 3.

We call $\varphi$ a proper majorant for (large deviations of) the statistic $T_S$. Equivalently, we say that (large deviations of) the statistic $T_S$ are properly majorated by $\varphi$.

To prove consistency of a test based on some test statistic, one usually needs some large deviations inequality for this test statistic. NT-statistics are no exception. In order to prove consistency of an NT-test, one has to choose some specific large deviations inequality to use in the proof. Part of the model regularity assumptions and the rate $d(n)$ will be determined by this choice. Without a general consistency theorem, if one would like to use another inequality, the proof of consistency has to be started anew.

In our method it is easier to prove different types of consistency theorems. Sometimes it is desirable to have a better rate $d(n)$ at the cost of more restrictive regularity assumptions, arising from the use of a strong probabilistic inequality; sometimes it is better to use a simple inequality that requires fewer regularity assumptions, but gives a worse rate $d(n)$. The meaning of Definitions 4 and 5 and Theorem 7 below is that one can be sure in advance that, whatever inequality one chooses, one will succeed in proving a consistency theorem, provided that the chosen inequality satisfies conditions (B) and (B2). Moreover, once an inequality is chosen, the rate of $d(n)$ is obtained from Theorem 9.

Some of the previously published proofs of consistency of data-driven tests relied heavily on the use of Prohorov's inequality. For many test statistics this inequality cannot be used to estimate the large deviations. This is usually the case for more complicated models where the matrix $L$ is not diagonal. This is typical for statistical inverse problems and even for such a basic problem as deconvolution.

Theorem 7. Let $\{T_k\}$ be a sequence of NT-statistics and $S$ be a selection rule for it. Assume that the penalty in $S$ is of proper weight and that large deviations of the statistics $T_k$ are properly majorated. Suppose that

(20)  $d(n) \le \min\{u_n, m_n\}.$

Then $S = O_{P_0}(1)$ and $T_S = O_{P_0}(1)$.

Remark 8. In Definition 4 we need $s(k,n)$ to be sure that the penalty $\pi$ is not "too light", i.e. that the penalty somehow affects the choice of the model dimension and protects us from choosing a "too complicated" model. In nontrivial cases, it follows from (B2) that $s(k,n) \to \infty$ as $k \to \infty$. The sequence $t(k,n)$, on the other hand, is introduced for reasons of statistical sense. Practically, the choice of $t(k,n)$ is dictated by the form of inequality (B) established for the problem. Additionally, one can drop assumptions 1 and 3 in Definition 3 and still prove a modified version of Theorem 7. But usually it happens that if the penalty does not satisfy all the conditions of Definitions 4 and 5, then $T_S$ has the same distribution under both alternative and null hypotheses and the test is inconsistent. Then, formally, the conclusions of Theorem 7 hold, but this has no statistical meaning.

Now we formulate the general consistency theorem for NT-statistics. We understand consistency of the test based on $T_S$ in the sense that under the null hypothesis $T_S$ is bounded in probability, while under fixed alternatives $T_S \to \infty$ in probability.

Theorem 9. Let $\{T_k\}$ be a sequence of NT-statistics and $S$ be a selection rule for it. Assume that the penalty in $S$ is of proper weight. Assume that conditions (A), and (15) are satisfied and that $d(n) = o(r_n)$, $d(n) \le \min\{u_n, m_n\}$. Then the test based on $T_S$ is consistent against any (fixed) alternative distribution $P$ satisfying condition (C).

7. Applications. As the first application, we have the following result. 

Theorem 10. Let $\{T_k\}$ be a family of SNT-statistics and $S$ a selection rule for the family. Assume that $Y_1, \dots, Y_n$ are i.i.d. Let $E\,l(Y_1) = 0$ and assume that for every $k$ the vector $(l_1(Y_1), \dots, l_k(Y_1))$ has the unit covariance matrix. Suppose that $\|(l_1(Y_1), \dots, l_k(Y_1))\|_k \le M(k)$ a.e., where $\|\cdot\|_k$ is the norm of the $k$-dimensional Euclidean space. Assume $\pi(k,n) - \pi(1,n) \ge 2k$ for all $k \ge 2$ and

(21)  $\lim_{n\to\infty} \frac{M(d(n))\,\pi(d(n),n)}{\sqrt{n}} = 0.$

Then $S = O_{P_0}(1)$ and $T_S = O_{P_0}(1)$.
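
As a hedged illustration of the penalty requirement in Theorem 10, take Schwarz's penalty $\pi(k,n) = k\log n$. This particular penalty is an assumption made here for concreteness; the theorem itself does not single it out. The requirement $\pi(k,n) - \pi(1,n) \ge 2k$ then reduces to a lower bound on the sample size:

```latex
\pi(k,n) - \pi(1,n) = (k-1)\log n \ \ge\ 2k
\quad\Longleftrightarrow\quad
\log n \ \ge\ \frac{2k}{k-1} = 2 + \frac{2}{k-1}, \qquad k \ge 2 .
% The right-hand side is decreasing in k and equals 4 at k = 2, hence
\log n \ \ge\ 4
\quad\Longleftrightarrow\quad
n \ \ge\ e^{4} \approx 54.6 .
```

So, for this penalty, the requirement holds simultaneously for all $k \ge 2$ as soon as $n \ge 55$.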



Example 1 (continued). As a simple corollary, we derive the following theorem that slightly generalizes Theorem 3.2 from Kallenberg (2002).

Theorem 11. Let $T_S$ be the Neyman's smooth data-driven test statistic for the case of simple hypothesis of uniformity. Assume that $\pi(k,n) - \pi(1,n) \ge 2k$ for all $k \ge 2$ and that for all $k \le d(n)$

$\lim_{n\to\infty} \frac{d(n)\,\pi(d(n),n)}{\sqrt{n}} = 0.$

Then $S = O_{P_0}(1)$ and $T_S = O_{P_0}(1)$.



Proof. It is enough to note that in this case $M(k) = \sqrt{(k-1)(k+3)}$ and apply Theorem 10. □
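
To make the object of Theorem 11 concrete, here is a minimal sketch of a data-driven Neyman smooth statistic for testing uniformity on $[0,1]$. It assumes the orthonormal Legendre system on $[0,1]$ as score functions and, purely as an example, Schwarz's penalty $\pi(k,n) = k\log n$; the function names are hypothetical. The selection rule picks the smallest dimension that maximizes the penalized statistic.

```python
import math


def legendre_orthonormal(j, x):
    """j-th Legendre polynomial, shifted to [0, 1] and normalized in L2[0, 1]."""
    t = 2.0 * x - 1.0
    if j == 0:
        return 1.0
    p_prev, p = 1.0, t
    for m in range(1, j):
        # Bonnet recurrence: (m + 1) P_{m+1}(t) = (2m + 1) t P_m(t) - m P_{m-1}(t)
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
    return math.sqrt(2 * j + 1) * p


def neyman_statistics(sample, d):
    """Return [T_1, ..., T_d], where T_k sums the squared normalized score sums."""
    n = len(sample)
    stats, total = [], 0.0
    for j in range(1, d + 1):
        s = sum(legendre_orthonormal(j, y) for y in sample) / math.sqrt(n)
        total += s * s
        stats.append(total)
    return stats


def schwarz_penalty(k, n):
    """Example penalty pi(k, n) = k log n (an assumption of this sketch)."""
    return k * math.log(n)


def select_dimension(stats, n, penalty):
    """Smallest k maximizing T_k - pi(k, n) over k = 1, ..., len(stats)."""
    scores = [t - penalty(k, n) for k, t in enumerate(stats, start=1)]
    return scores.index(max(scores)) + 1


# A perfectly regular "uniform" sample versus a visibly non-uniform one.
n = 200
grid = [(i + 0.5) / n for i in range(n)]
for sample in (grid, [u * u for u in grid]):
    stats = neyman_statistics(sample, 5)
    s = select_dimension(stats, n, schwarz_penalty)
    print(s, stats[s - 1])
```

Because the identity matrix plays the role of $L$ here, $T_k$ is simply a cumulative sum of squares, so the statistics are nondecreasing in $k$ and the penalty alone decides where to stop.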

Remark 12. In my view, the precise rate at which $d(n)$ tends to infinity is not crucial for many practical applications.



Example 5 (continued). In Kallenberg and Ledwina (1999) the following consistency result was established.

Theorem 13. Suppose that d{n) = o{{j^]^^'^^). Let $P$ be an alternative and let $F$ and $G$ be the marginal distribution functions of $X$ and $Y$ under $P$. Let



(22)  $E_P\, b_j(F(X))\, b_j(G(Y)) \ne 0$

for some $j$. If $d(n) \to \infty$, then $T_S \to \infty$ as $n \to \infty$ when $P$ applies (i.e. $T_S$ is consistent against $P$).

For this problem, our condition (C) is equivalent to the following one:
(23) Ert„4^^^'W{^l^)^0.



For continuous $F$ and $G$, (23) is asymptotically equivalent to (22), since both $F(X)$ and $G(Y)$ are distributed as $U[0,1]$ and

n n 
We see that Theorem 9 is applicable to get a result similar to Theorem 13. We do not go into technical details here. □

8. Quadratic forms of P-type. Now we introduce another notion, 
concerning quadratic forms. 

Definition 6. Let $Z_1, Z_2, \dots, Z_n$ be identically distributed (not necessarily independent) random vectors with $k$ components each. Denote their common distribution function by $F$. Let $Q$ be a $k \times k$ symmetric matrix. Then $Q(x) := x\,Q\,x^T$ defines a quadratic form, for $x \in \mathbb{R}^k$. We say that $Q(x)$ is a quadratic form of Prohorov's type (or just P-type) for the distribution $F$, if for some $\{s(k,n)\}_{k,n=1}^{\infty}$, $\{t(k,n)\}_{k,n=1}^{\infty}$ satisfying (B1) it holds that for all $k$, and for all $y \in [s(k,n);\, t(k,n)]$

(24)  $P_F\Bigl(n\,Q\Bigl(\frac{Z_1 + Z_2 + \dots + Z_n}{n}\Bigr) \ge y\Bigr) \le \varphi(k;y),$

with $\varphi$ being a proper majorant for $P_F$ and of the form

(25) ^{k; y) = Ci ^k) X2, ■ ■ ■ , Xk) expj - ,
where $\lambda_1, \lambda_2, \dots, \lambda_k$ are the eigenvalues of matrix $Q$, and $C_1$, $C_2$ are uniform in the sense that they do not depend on $y$, $k$, $n$. For brevity, we say that $Q(x)$ is of P-type for the $Z_i$'s.

We have the following direct consequence of Theorem 9.

Corollary 14. Suppose that for $T_S$ condition (A) holds, $L$ is of P-type for the distribution function of the vectors $\{(l_1(Y_i), l_2(Y_i), \dots, l_k(Y_i))\}_{i=1}^{n}$ and that the penalty in $S$ is of proper weight. Then the test based on $T_S$ is consistent against any alternative $P$ satisfying (C).

If $Z_1, Z_2, \dots, Z_n$ are i.i.d. and $Q$ is a diagonal positive definite matrix, then $Q(x)$ is of P-type because of the Prohorov inequality. Definition 6 is meant to incorporate all the cases when Prohorov's inequality or some of its variations holds. Thus, Definition 6 is just some specification of the general condition (B) from Theorem 7. The definition is useful in the sense that it shows which kind of majorating functions $\varphi$ could (and typically would) occur in condition (B).

A simple sufficient condition for $L$ to be of P-type is not known. But there is a method that makes it possible to establish the P-type property in many particular situations. This method consists of two steps. In the first step, one approximates the quadratic form $Q(l(Y))$ by the simpler quadratic form $Q(N)$, where $N$ is the Gaussian random variable with the same mean and covariance structure as $l(Y)$. This approximation is possible, for example, under conditions given in Bentkus and Götze (1996) or Götze and Tikhomirov (1999). These authors gave the rate of convergence for such approximation. Then the second step is to establish a large deviation result for the quadratic form $Q(N)$; this form has a more predictable distribution. For strongly dependent random variables, one can hope to use some techniques from Horvath and Shao (1999).

As a side note, many of the conditions for the existence of such an approximation of $Q(l(Y))$ are rather technical and specific to the structure of $L$. For example, sometimes assumptions on the 5 largest eigenvalues of $L$ can be required. See the above papers by Götze, Bentkus, Tikhomirov and references therein.
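
For the second step of the method described above, one simple (and by no means sharpest) way to bound large deviations of a Gaussian quadratic form is the classical Chernoff bound. The sketch below is an illustration only, not the inequality used in this paper: it assumes that, after diagonalization, $Q(N) = \sum_i \mu_i \xi_i^2$ with independent standard normal $\xi_i$ and positive weights $\mu_i$, and uses $E\exp(t\,Q(N)) = \prod_i (1 - 2t\mu_i)^{-1/2}$ for $0 < t < 1/(2\max_i \mu_i)$. The function name and the grid search are hypothetical choices.

```python
import math


def chernoff_bound(weights, y, grid=1000):
    """Upper bound for P(sum_i weights[i] * xi_i^2 >= y), xi_i i.i.d. N(0, 1).

    Minimizes exp(-t y) * prod_i (1 - 2 t w_i)^(-1/2) over a grid of
    admissible t in (0, 1 / (2 max w_i)). All weights must be positive.
    """
    mu_max = max(weights)
    best = 1.0  # a probability never exceeds one
    for step in range(1, grid):
        t = step / grid / (2.0 * mu_max)
        log_mgf = -0.5 * sum(math.log(1.0 - 2.0 * t * w) for w in weights)
        best = min(best, math.exp(log_mgf - t * y))
    return best


# Two unit weights give a chi-square law with 2 degrees of freedom,
# whose exact tail is exp(-y / 2); the bound must dominate it.
for y in (10.0, 20.0, 40.0):
    print(y, chernoff_bound([1.0, 1.0], y), math.exp(-y / 2.0))
```

The bound decays exponentially in $y$, which is the qualitative behaviour a majorant $\varphi(k;y)$ needs in condition (B2); the dependence on the eigenvalues enters through the weights.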

9. GNT-statistics. The notion of NT-statistics is helpful if the null hypothesis is simple. However, for composite hypotheses it is not always possible to find a suitable $L$ from Definition 3. Therefore the concept of NT-statistics needs to be modified to be applicable to composite hypotheses. The following definition can be helpful.

Definition 7. Suppose we have $n$ random observations $Y_1, \dots, Y_n$ assuming values in a measurable space $\mathbb{Y}$. For simplicity of presentation, assume they are identically distributed. Let $k$ be a fixed number and $l = (l_1, \dots, l_k)$ be a vector-function, where $l_i : \mathbb{Y} \to \mathbb{R}$ for $i = 1, \dots, k$ are some (maybe unknown) Lebesgue measurable functions. Set

(26)  $L^{(0)} = \bigl\{E_0\,[l(Y)]^T l(Y)\bigr\}^{-1},$

where the expectation $E_0$ is taken w.r.t. $P_0$, and $P_0$ is the (possibly unknown) distribution function of the $Y$'s under the null hypothesis. Assume that $E_0\, l(Y) = 0$ and that $L^{(0)}$ is well-defined in the sense that all of its elements are finite. Let $L_k$ denote, for every $k$, a $k \times k$ symmetric positive definite (known) matrix with finite elements such that for the sequence $\{L_k\}$ it holds that

(27)  $\|L_k - L^{(0)}\| = o_{P_0}(1).$

Let $l_1^*, \dots, l_n^*$ be sufficiently good estimators of $l(Y_1), \dots, l(Y_n)$ with respect to $P_0$ in the sense that for every $\varepsilon > 0$

(28)  $P_0^n\Bigl(\Bigl\|\frac{1}{\sqrt n}\sum_{j=1}^{n}\bigl(l_j^* - l(Y_j)\bigr)\Bigr\| \ge \varepsilon\Bigr) \to 0$ as $n \to \infty$,

where $\|\cdot\|$ denotes the Euclidean $k$-norm of a given vector. Set

(29)  $GT_k = \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l_j^*\Bigr\}\, L_k\, \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l_j^*\Bigr\}^T.$

We call $GT_k$ a generalized statistic of Neyman's type (or a GNT-statistic). Let the selection rule $S$ satisfy Definition 3. We call $GT_S$ a data-driven GNT-statistic.

Remark 15. Now it is not obligatory to know the functions $l_1, \dots, l_k$ explicitly (in Definition 3 we assumed that we know those functions). It is only important that we should be able to choose reasonably good $L_k$ and $l_j^*$'s. Definition 7 generalizes the idea of efficient score test statistics with estimated scores.

Remark 16. Establishing (28) in parametric problems is usually not difficult and can be done if a $\sqrt{n}$-consistent estimate of the nuisance parameter is available (see Langovoy (2007a)). In semiparametric models, finding estimators for the score function that satisfy (28) is more difficult and not always possible, but there exist effective methods for constructing such estimates. Often the sample splitting technique is helpful. See Schick (1986), Schick (1987), Klaassen (1987) for general results related to the topic. See also Example 10 below.

Example 7 (trivial). If $Y_1, \dots, Y_n$ are equally distributed and $T_k$ is an NT-statistic, then $T_k$ is also a GNT-statistic. Indeed, put in Definition 7 $L_k := L^{(0)}$ and $l_j^*(Y_1, \dots, Y_n) := l(Y_j)$.



Example 8. Let $X_1, \dots, X_n$ be i.i.d. random variables with density $f(x)$. Consider testing the composite hypothesis

$H_0:\ f(x) \in \{f(x;\beta),\ \beta \in B\},$

where $B \subset \mathbb{R}^q$ and $\{f(x;\beta),\ \beta \in B\}$ is a given family of densities. In Inglot et al. (1997), the data-driven score test for testing $H_0$ was constructed using the score test for composite hypotheses from Cox and Hinkley (1974). Here we briefly describe the construction from Inglot et al. (1997). Let $F$ be the distribution function corresponding to $f$ and set



i=l 

with $j$ depending on the context. Let $I$ be the $k \times k$ identity matrix. Define



dl3,



.,q;j=l,...,k
Let $\hat\beta$ denote the maximum likelihood estimator of $\beta$ under $H_0$. Then the score statistic is given by

(30)  $W_k(\hat\beta) = n\, Y_n(\hat\beta)\,\{I + R(\hat\beta)\}\, Y_n^T(\hat\beta).$



As follows from the results of Cox and Hinkley (1974), Section 9.3, pp. 323-324, in a regular enough situation $W_k(\hat\beta)$ satisfies Definition 7 and is a GNT-statistic. Practically useful sets of such regularity assumptions are given in Inglot et al. (1997).



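Example 9. Consider the problem described in Example 5, but with the following complication introduced. Suppose that the density $h$ of $\varepsilon$ is unknown. The score function for $(\theta,\eta)$ at $(\theta_0,\eta_0)$ is

(31)  $l_{\theta_0,\eta_0}(y) = \bigl(l_{\theta_0}(y),\ l_{\eta_0}(y)\bigr),$

where $l_{\theta_0}$ is the score function for $\theta$ at $\theta_0$ and $l_{\eta_0}$ is the score function for $\eta$ at $\eta_0$, i.e.

(32)  $l_{\theta_0}(y) = \frac{\frac{\partial}{\partial\theta}\bigl(\int_{\mathbb{R}} f_{\theta}(s)\,h_{\eta_0}(y-s)\,ds\bigr)\big|_{\theta=\theta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\,h_{\eta_0}(y-s)\,ds}\ 1_{[y:\,g(y;(\theta_0,\eta_0))>0]},$

(33)  $l_{\eta_0}(y) = \frac{\frac{\partial}{\partial\eta}\bigl(\int_{\mathbb{R}} f_{\theta_0}(s)\,h_{\eta}(y-s)\,ds\bigr)\big|_{\eta=\eta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\,h_{\eta_0}(y-s)\,ds}\ 1_{[y:\,g(y;(\theta_0,\eta_0))>0]}.$

The Fisher information matrix of parameter $(\theta,\eta)$ is

(34)  $I(\theta,\eta) = \int_{\mathbb{R}} l_{\theta,\eta}^T(y)\, l_{\theta,\eta}(y)\, dG_{\theta,\eta}(y),$

where $G_{\theta,\eta}(y)$ is the probability measure corresponding to the density $g(y;(\theta,\eta))$. Let us write $I(\theta_0,\eta_0)$ in the block matrix form:

(35)  $I(\theta_0,\eta_0) = \begin{pmatrix} I_{11}(\theta_0,\eta_0) & I_{12}(\theta_0,\eta_0) \\ I_{21}(\theta_0,\eta_0) & I_{22}(\theta_0,\eta_0) \end{pmatrix},$

where $I_{11}(\theta_0,\eta_0) = E_{\theta_0,\eta_0}\, l_{\theta_0}^T l_{\theta_0}$, $I_{12}(\theta_0,\eta_0) = E_{\theta_0,\eta_0}\, l_{\theta_0}^T l_{\eta_0}$, and analogously for $I_{21}(\theta_0,\eta_0)$ and $I_{22}(\theta_0,\eta_0)$. The efficient score function for $\theta$ in this model is

(36)  $l^*_{\theta_0}(y) = l_{\theta_0}(y) - I_{12}(\theta_0,\eta_0)\, I_{22}^{-1}(\theta_0,\eta_0)\, l_{\eta_0}(y),$

and the efficient Fisher information matrix for $\theta$ is

(37)  $I^*_{\theta_0} = E_{\theta_0,\eta_0}\, l^{*T}_{\theta_0}\, l^*_{\theta_0} = \int_{\mathbb{R}} l^*_{\theta_0}(y)^T\, l^*_{\theta_0}(y)\, dG_{\theta_0,\eta_0}(y).$

The efficient score test statistics for composite deconvolution problem is

This is a GNT-statistic if the estimators satisfy (27) and (28). See Langovoy (2007b) for more details.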



Example 10. The following semiparametric example belongs to Inglot and Ledwina (2006). Let $Z = (X, Y)$ denote a random vector in $I \times \mathbb{R}$, $I = [0,1]$. We would like to test the null hypothesis

where $X$ and $\varepsilon$ are independent, $E\varepsilon = 0$, $E\varepsilon^2 < \infty$, $\beta \in \mathbb{R}^q$ is a vector of unknown real valued parameters, $v(x) = (v_1(x), \dots, v_q(x))$ is a vector of known functions. Suppose $X$ has an unknown density $g$, and $\varepsilon$ an unknown density $f$ with respect to Lebesgue measure $\lambda$.
Choose some real functions $u_1(x), u_2(x), \dots$. Set

r{z) = r{x,y) :=- 

+-[y - v{x)P'^][mi - m2V~'^M], 

T 

where 



L 
J 



(y - v{x)l5^ 



[u{x)-v{x)V-^M] + 



mi=Egu{X), m2 = Egv{X), m= (mi, 771,2), 



w{x) = {u{x),v{x)), u{x) = u{x) — mi, v{x) = v{x) — m2, 
while M and V are blocks in 
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W 




hj.E,[u,{X)f[w{X)] + ^ 
4 r 



where J = J(/) = /^^ I/^dA(2/). Finally set 



then the efficient score statistic is

where $\hat l^*(\cdot)$ is an estimator of $l^*$, while $\hat L$ is an estimator of $L$. Inglot and Ledwina proposed, under additional regularity assumptions on the model, certain estimators for these quantities such that conditions (27) and (28) are satisfied. Therefore, the efficient score statistic becomes a GNT-statistic and its asymptotic properties can be studied by the method of this paper. □

Remark 17. In general, it seems to be possible to use the idea of a score process and some other techniques from Bickel et al. (2006) in order to construct and analyze NT- and GNT-statistics. This can be seen from the fact that such applications as in Examples 6 and 8 naturally appear in both papers. The difference with the above paper would be that we prefer to use test statistics of the form (29) rather than integrals or supremums of score processes.

In semi- and nonparametric models, generalized likelihood ratios from Fan et al. (2001) and Li and Liang (2007), as well as modifications of empirical likelihood, could also be a powerful tool for constructing NT- and GNT-statistics.

A general consistency theorem for GNT-statistics is required. Without a general consistency theorem, one has to perform the whole proof of consistency anew for every particular problem. This becomes difficult in cases where sample splitting, complicated estimators and infinite-dimensional parameters are involved. Therefore, in my opinion, for most of the semi- and nonparametric problems general consistency theorems are the most convenient tool for proving consistency of data-driven NT- and GNT-tests. If one has a general consistency theorem analogous to Theorem 9 for data-driven NT-statistics, then at least some consistency result will follow automatically.

Now we prove consistency theorems for data-driven GNT-statistics. First, note that Definitions 4 and 5 are also meaningful for a sequence of GNT-statistics $\{GT_k\}$, if only instead of $L$ we use in Definition 4 and in (19) the matrix $L^{(0)}$ from Definition 7. To be technically correct in the statement of the next theorem, we introduce the auxiliary random variable $R_k$ that approximates the statistic of interest $GT_k$:

$R_k = \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}\, L^{(0)}\, \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}^T.$



Definition 8. We say that the penalty in the data-driven test statistic $GT_S$ is of proper weight, if this penalty is of proper weight for $R_S$ in the sense of Definition 4. We say that $GT_S$ is properly majorated, if $R_S$ is properly majorated in the sense of Definition 5.

Due to conditions (27) and (28) from the definition of GNT-statistics, this definition serves just for purposes of formal technical correctness.

Theorem 18. Let $\{GT_k\}$ be a sequence of GNT-statistics and $S$ be a selection rule for it. Assume that the penalty in $S$ is of proper weight and that large deviations of $GT_k$ are properly majorated. Suppose that $d(n) \le \min\{u_n, m_n\}$. Then under the null hypothesis it holds that $S = O_{P_0}(1)$ and $GT_S = O_{P_0}(1)$.

To ensure consistency of $GT_S$ against some alternative distribution $P$, it is necessary and sufficient to show that under $P$ it holds that $GT_S \to \infty$ in $P$-probability as $n \to \infty$. There are different possible additional sets of assumptions on the construction that make it possible to prove consistency against different sets of alternatives. For example, suppose that

(38)  (C1)  $\|L_k - L^{(0)}\| = o_P(1)$

and that $l_1^*, \dots, l_n^*$ are sufficiently good estimators of $l(Y_1), \dots, l(Y_n)$ with respect to $P$, i.e. that for every $\varepsilon > 0$

(39)  $P^n\Bigl(\Bigl\|\frac{1}{\sqrt n}\sum_{j=1}^{n}\bigl(l_j^* - l(Y_j)\bigr)\Bigr\| \ge \varepsilon\Bigr) \to 0$ as $n \to \infty$.

These assumptions are very strong: they mean that the estimators plugged into $GT_k$ are not only good at one point $P_0$, but also possess some globally good quality.

Theorem 19. Let $\{GT_k\}$ be a sequence of GNT-statistics and $S$ be a selection rule for it. Assume that the penalty in $S$ is of proper weight. Assume that conditions (A), ( [i^[ ) and (15) are satisfied and that $d(n) = o(r_n)$, $d(n) \le \min\{u_n, m_n\}$. Then the test based on $GT_S$ is consistent against any (fixed) alternative distribution $P$ satisfying (C), (C1) and (39).

Remark 20. Substantial relaxation of assumptions (38) and (39) should be possible. Indeed, these assumptions ensure not only that $GT_S \to \infty$, but also that $GT_S \to R_S$ under $P$, where $R_S$ is as in Definition 8. This is much stronger than required for our purposes, since for us $GT_S \to \infty$ is enough and the order of growth is not important for proving consistency.



Remark 21. In the literature on nonparametric testing, some authors consider the number of observations $n$ tending to infinity and alternatives (of specific form) that tend to the null hypothesis at some speed. For such alternatives, some kind of minimax rate for testing can be established. The hardness of the testing problem, and the efficiency of the test, can be measured by this rate. See, for example, Ingster and Suslina (2003), Abramovich and Heller (2005), Spokoiny (1998). We do not consider rates for testing in this paper, but it is possible to consider local alternatives in this general setup as well. For example, minimax optimality of the penalized likelihood estimators, for a rather general class of penalties, was studied in Abramovich et al. (2007). In Fan et al. (2001), it was shown that, for a certain class of statistical problems, the generalized likelihood ratio statistics achieve optimal rates of convergence given in Ingster and Suslina (2003). In our case, this remains to be investigated.

Appendix.



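Proof. (Theorem 3). By the law of large numbers, as $n \to \infty$,

(40)  $\frac{1}{n}\sum_{i=1}^{n} l_K(Y_i) \to_P C_P \ne 0.$

We get

$T_K - \pi(K,n) \ \ge\ n\lambda_K^{(K)}\Bigl(\frac{1}{n}\sum_{i=1}^{n} l_K(Y_i)\Bigr)^2 - \pi(K,n)$
$\qquad =\ n\lambda_K^{(K)}\bigl(C_K^2 + o_P(1)\,C_K\bigr) - \pi(K,n)$
$\qquad =\ n\lambda_K^{(K)} C_K^2 + o_P\bigl(n\lambda_K^{(K)}\bigr) - \pi(K,n),$

and, because $K$ and $C_K$ are constants determined by fixed $P$, condition (15) yields

(42)  $T_K - \pi(K,n) \to_P \infty$ as $n \to \infty$.

On the other hand, by (13)

$\Bigl(\frac{1}{\sqrt n}\sum_{i=1}^{n} l_1(Y_i),\ \dots,\ \frac{1}{\sqrt n}\sum_{i=1}^{n} l_{K-1}(Y_i)\Bigr) \to_d N,$

where $N$ is a $(K-1)$-dimensional multivariate normal distribution with the expectation vector equal to zero. This implies that $T_k = O_P(1)$ for all $k = 1, 2, \dots, K-1$, because

$T_k \ \le\ \lambda_1^{(k)} \sum_{i=1}^{k}\Bigl(\frac{1}{\sqrt n}\sum_{j=1}^{n} l_i(Y_j)\Bigr)^2 = \lambda_1^{(k)}\, O_P(1) = O_P(1)$

and $\lambda_1^{(1)}, \lambda_1^{(2)}, \dots, \lambda_1^{(K-1)}$ are constants and $K < \infty$. Now by (42)

$\lim_{n\to\infty} \sum_{k=1}^{K-1} P\bigl(T_k - \pi(k,n) \ge T_K - \pi(K,n)\bigr) = 0.$

But for $d(n) \ge K$

$P(S < K) \ \le\ \sum_{k=1}^{K-1} P\bigl(T_k - \pi(k,n) \ge T_K - \pi(K,n)\bigr),$

and the theorem follows.

□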

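Because of assumption (A) we can prove the following lemma.

Lemma 22. For every fixed $x > 0$,

$P\Bigl(\Bigl|\frac{1}{\sqrt n}\sum_{i=1}^{n} l_K(Y_i)\Bigr| \le \sqrt{x/\lambda_K^{(K)}}\Bigr) = O\Bigl(\frac{1}{r_n}\Bigr).$

Proof. Denote $x_n := \sqrt{x/(n\lambda_K^{(K)})}$. Obviously, $x_n \to 0$ as $n \to \infty$. We have

$P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n} l_K(Y_i)\Bigr| \le x_n\Bigr) = P\Bigl(-x_n \le \frac{1}{n}\sum_{i=1}^{n} l_K(Y_i) \le x_n\Bigr)$

and remember that by (C) we have $E_P\, l_K(Y_i) = C_K$, so that the last probability equals

$P\Bigl(-x_n - C_K \le \frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr) \le x_n - C_K\Bigr).$

Here we get two cases. First, suppose $C_K > 0$. Then we continue as follows:

$P\Bigl(-x_n - C_K \le \frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr) \le x_n - C_K\Bigr)$
$\qquad \le\ P\Bigl(\frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr) \le x_n - C_K\Bigr)$
$\qquad \le\ P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr)\Bigr| \ge |x_n - C_K|\Bigr)$
(for all $n \ge$ some $n_K$)
$\qquad \le\ P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr)\Bigr| \ge \frac{C_K}{2}\Bigr) = O\Bigl(\frac{1}{r_n}\Bigr)$

by (A), and so we proved the lemma for the case $C_K > 0$. In case $C_K < 0$, we write

$P\Bigl(-x_n - C_K \le \frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr) \le x_n - C_K\Bigr)$
$\qquad \le\ P\Bigl(\frac{1}{n}\sum_{i=1}^{n}\bigl(l_K(Y_i) - E_P\, l_K(Y_i)\bigr) \ge -x_n - C_K\Bigr)$

and then we proceed analogously to the previous case.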



□

Proof. (Lemma 5). We will use Sloane's asymptotic expansion for the standard normal distribution function $\Phi$: for $x \to \infty$

$\Phi(x) = 1 - (2\pi)^{-1/2} \exp(-x^2/2)\,\bigl(x^{-1} + o(x^{-1})\bigr).$

From this expansion and the CLT it follows that

$P\Bigl(\frac{1}{n}\sum_{i=1}^{n}\bigl[l_K(Y_i) - E_P\, l_K(Y_i)\bigr] \ge y\Bigr) = P\Bigl(\frac{1}{\sqrt n}\sum_{i=1}^{n}\frac{l_K(Y_i) - E_P\, l_K(Y_i)}{\sigma} \ge \frac{y\sqrt n}{\sigma}\Bigr)$
$\qquad \sim\ 1 - \Phi\bigl(y\sqrt n/\sigma\bigr)\ \sim\ (2\pi)^{-1/2}\,\frac{\sigma}{y\sqrt n}\,\exp\Bigl(-\frac{ny^2}{2\sigma^2}\Bigr),$

and we see that $r_n = \exp(ny^2/2\sigma^2)$ is even more than enough.

□
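
For bounded variables, condition (A) can also be checked without any asymptotic approximation by means of Hoeffding's inequality. This is a standard alternative route, added here only as an illustration; it is not the argument used in the proof above. Assume that $a \le l_K(Y_i) \le b$ almost surely, where the bounds $a$ and $b$ are notation introduced for this example, and that the $l_K(Y_i)$ are independent. Then for every $y > 0$ and every $n$:

```latex
P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n}\bigl[l_K(Y_i) - E_P\,l_K(Y_i)\bigr]\Bigr| \ge y\Bigr)
\ \le\ 2\exp\Bigl(-\frac{2ny^{2}}{(b-a)^{2}}\Bigr),
% so condition (A) holds with the exponential rate
r_n = \exp\Bigl(\frac{2ny^{2}}{(b-a)^{2}}\Bigr).
```

The exponent differs from the one in Lemma 5 because it is expressed through the range $b - a$ rather than the variance, but the rate is again exponential in $n$.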



Proof. (Theorem 6). Let $x > 0$. Since $T_j \ge T_K$ if $j \ge K$ and (11) holds, we get by Theorem 3 that

$P(T_S \le x) = \sum_{j=K}^{d(n)} P(T_j \le x,\ S = j) + o(1)$
$\qquad \le\ d(n)\, P(T_K \le x) + o(1)$
$\qquad \le\ d(n)\, P\Bigl(\lambda_K^{(K)}\Bigl|\frac{1}{\sqrt n}\sum_{i=1}^{n} l_K(Y_i)\Bigr|^2 \le x\Bigr) + o(1).$

Now by Lemma 22 and (16) we get

$P(T_S \le x) = O\Bigl(\frac{d(n)}{r_n}\Bigr) + o(1) = o(1).$

□

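Proof. (Theorem 7). If $S \ge K$, then $T_k - T_1 \ge \pi(k,n) - \pi(1,n)$ for some $K \le k \le d(n)$ and so, equivalently,

(43)  $\Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}\, L\, \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}^T \ \ge\ \frac{\bigl(\frac{1}{\sqrt n}\sum_{j=1}^{n} l_1(Y_j)\bigr)^2}{E_0\, l_1^2} + \bigl(\pi(k,n) - \pi(1,n)\bigr)$

for some $K \le k \le d(n)$, where $l = (l_1, l_2, \dots, l_k)$. We can rewrite (43) in terms of the notation (17) - (19) as follows:

(44)  $(\sqrt n\,\bar l_1, \dots, \sqrt n\,\bar l_k)\, L\, (\sqrt n\,\bar l_1, \dots, \sqrt n\,\bar l_k)^T = n\,(\bar l_1, \dots, \bar l_k)\, L\, (\bar l_1, \dots, \bar l_k)^T \ \ge\ \frac{n\,\bar l_1^{\,2}}{E_0\, l_1^2} + \bigl(\pi(k,n) - \pi(1,n)\bigr),$

for some $K \le k \le d(n)$. Denote $\Delta(k,n) := \pi(k,n) - \pi(1,n)$; then with the help of (19) we rewrite (44) as

(45)  $n\,Q_k(\bar l) \ \ge\ \Delta(k,n) + \frac{n\,\bar l_1^{\,2}}{E_0\, l_1^2},$

for some $K \le k \le d(n)$. Clearly,

$P_0(S \ge K) \ \le\ P_0\bigl(\text{(43) holds for some } K \le k \le d(n)\bigr)$
$\qquad =\ P_0\bigl(\text{(45) holds for some } K \le k \le d(n)\bigr)$
$\qquad \le\ P_0\bigl(n\,Q_k(\bar l) \ge \Delta(k,n) \text{ for some } K \le k \le d(n)\bigr).$

But now by condition (B) we have

$P_0(S \ge K) \ \le\ P_0\bigl(n\,Q_k(\bar l) \ge \Delta(k,n) \text{ for some } K \le k \le d(n)\bigr)$
$\qquad \le\ \sum_{k=K}^{d(n)} P_0\bigl(n\,Q_k(\bar l) \ge \Delta(k,n)\bigr)$

(46)  $\qquad \le\ \sum_{k=K}^{d(n)} \varphi\bigl(k;\, \Delta(k,n)\bigr),$

if only $d(n) \le \min\{u_n, m_n\}$ (see Definition 4). Thus, because of condition (B2) and the monotonicity of $\varphi$ in the second argument, for each $\varepsilon > 0$ there exists $K = K_\varepsilon$ such that for all $n > n(\varepsilon)$ we have $P_0(S \ge K) \le \varepsilon$, i.e. $S = O_{P_0}(1)$.

Now, by standard inequalities, it is possible to show that $T_S = O_{P_0}(1)$. Let us write for an arbitrary real $t > 0$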





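$P_0(|T_S| \ge t) = \sum_{m=1}^{K_\varepsilon} P_0(|T_m| \ge t;\ S = m) + \sum_{m=K_\varepsilon+1}^{d(n)} P_0(|T_m| \ge t;\ S = m)$
$\qquad \le\ \sum_{m=1}^{K_\varepsilon} P_0(|T_m| \ge t) + \sum_{m=K_\varepsilon+1}^{d(n)} P_0(S = m)$
$\qquad =\ \sum_{m=1}^{K_\varepsilon} P_0(|T_m| \ge t) + P_0(S \ge K_\varepsilon + 1)$
$\qquad \le\ \sum_{m=1}^{K_\varepsilon} P_0(|T_m| \ge t) + \varepsilon$
$\qquad =:\ R(t) + \varepsilon.$

For $t \to \infty$ we have $P_0(|T_m| \ge t) \to 0$ for every fixed $m$, so $R(t) \to 0$ as $t \to \infty$. Now it follows that for arbitrary $\varepsilon > 0$

$\limsup_{t\to\infty} P_0(|T_S| \ge t) \le \varepsilon,$

therefore

$\limsup_{t\to\infty} P_0(|T_S| \ge t) = 0$

and

$\lim_{t\to\infty} P_0(|T_S| \ge t) = 0.$

This completes the proof. □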

Proof. (Theorem 9). Follows from Theorems 3, 6 and 7 and our definition of consistency. □

In the next proof we will need the following theorem from Prohorov (1973).

Theorem 23. Let $Z_1, \dots, Z_n$ be i.i.d. random vectors with values in $\mathbb{R}^k$. Let $E Z_i = 0$ and let the covariance matrix of $Z_i$ be equal to the identity matrix. Assume $\|Z_1\|_k \le L$ a.e. Then, for $2k \le y^2 \le nL^{-2}$, we have



where $0 \le \eta_n \le L\,y\,n^{-1/2}$.

Proof. (Theorem 10). The SNT-statistic $T_S$ is an NT-statistic with $L_k = E_{k\times k}$ and $\lambda_1^{(k)} = \dots = \lambda_k^{(k)} = 1$. Therefore Theorem 7 is applicable. Put (in Theorem 7) $s(k,n) = \sqrt{2k}$, $t(k,n) = \sqrt{n}\,M(k)^{-1}$. The Prohorov inequality is applicable if $M(k)\,\pi(k,n) \le \sqrt{n}$ and $M^2(k)\,\pi(k,n) \le n$ for all $k \le d(n)$; therefore assumption (21) guarantees that the Prohorov inequality is applicable and, moreover, that (B) holds with

Since $\varphi$ is exponentially decreasing in $y$ under (21), it is a matter of simple calculations to prove that (B2) is satisfied with $u_n = d(n)$ for any sequence $\{d(n)\}$ such that (21) holds.

□

Proof. (Theorem 18). Consider the auxiliary random variable

$R_k = \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}\, L^{(0)}\, \Bigl\{\frac{1}{\sqrt n}\sum_{j=1}^{n} l(Y_j)\Bigr\}^T.$

This is not a test statistic, but formally this random variable satisfies Definition 3. Therefore Theorem 7 is applicable for $R_k$. Since under the null hypothesis $GT_k \to R_k$ and $GT_S \to R_S$ in $P_0$-probability by Definition 7, we get the statement of the theorem by the Slutsky lemma. □

Proof. (Theorem 19). Consider the random variable $R_k$ defined in the proof of Theorem 18. Theorems 3, 6 and 7 are valid for the random variable $R_S$. Under the assumptions of the theorem, $GT_S \to R_S$ in $P$-probability, and we get the statement of the theorem by the Slutsky lemma. □